Unsupervised Detection of Violent Content in Arabic Social Media

Authors

  • Dhinaharan Nagamalai
  • Kareem E Abdelfatah
  • Gabriel Terejanu
Abstract

A monitoring system is proposed to detect violent content in Arabic social media. This is a new and challenging task because of the variety of Arabic dialects used on social media and the non-violent contexts in which violent words may appear. We propose a probabilistic nonlinear dimensionality reduction technique, the sparse Gaussian process latent variable model (SGPLVM), followed by k-means to separate violent from non-violent content. This framework does not require any labelled corpora for training. We show that violent and non-violent Arabic tweets are not separable using k-means in the original high-dimensional space; however, better results are achieved by clustering in the low-dimensional latent space of SGPLVM.
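As a rough illustration of the two-stage idea in the abstract (embed the texts in a low-dimensional space, then run k-means with two clusters), the following Python sketch uses NumPy only. It is a stand-in under stated assumptions, not the authors' implementation: the embedding step uses linear PCA rather than SGPLVM, which is nonlinear and probabilistic, and the toy English documents, the function names (`bag_of_words`, `pca_embed`, `kmeans`), and all parameter values are hypothetical and do not come from the paper.

```python
import numpy as np


def bag_of_words(docs):
    """Build a document-term count matrix from whitespace-tokenized texts."""
    vocab = sorted({tok for d in docs for tok in d.split()})
    index = {tok: j for j, tok in enumerate(vocab)}
    X = np.zeros((len(docs), len(vocab)))
    for i, d in enumerate(docs):
        for tok in d.split():
            X[i, index[tok]] += 1.0
    return X, vocab


def pca_embed(X, n_components=2):
    """Linear stand-in for the latent-space step (assumption: PCA, NOT SGPLVM)."""
    Xc = X - X.mean(axis=0)                      # center each feature
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


def kmeans(Z, k=2, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns cluster labels and centers."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        # squared Euclidean distance from every point to every center
        dists = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            Z[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Hypothetical toy documents standing in for tweets (not data from the paper).
docs = [
    "attack bomb explosion city",
    "bomb attack killed soldiers",
    "explosion killed many city",
    "football match goal team",
    "team won match tonight",
    "goal tonight football fans",
]

X, vocab = bag_of_words(docs)        # high-dimensional representation
Z = pca_embed(X, n_components=2)     # low-dimensional embedding
labels, centers = kmeans(Z, k=2)     # two clusters, no labelled data needed
print(labels)
```

The cluster indices are arbitrary, so deciding which cluster corresponds to violent content would still need a separate inspection step. Replacing `pca_embed` with a nonlinear latent variable model is where this sketch would have to change to approach the method the abstract describes.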

Similar resources

Using Machine Learning Algorithms for Automatic Cyber Bullying Detection in Arabic Social Media

Social media allows people to interact and express their thoughts or feelings about different subjects. However, some users may write offensive tweets to others via social media, which is known as cyberbullying. Successful prevention depends on automatically detecting malicious messages. Automatic detection of bullying in the text of social media by analyzing the text "tweets" via one of the machine l...

Understanding and Discovering Deliberate Self-harm Content in Social Media

Studies suggest that self-harm users found it easier to discuss self-harm-related thoughts and behaviors using social media than in the physical world. Given the enormous and increasing volume of social media data, on-line self-harm content is likely to be buried rapidly by other normal content. To enable voices of self-harm users to be heard, it is important to distinguish self-harm content fr...

Towards a Corpus of Violence Acts in Arabic Social Media

In this paper we present a new corpus of Arabic tweets that mention some form of violent event, developed to support the automatic identification of human rights abuses and different violent acts. The dataset was manually labelled for seven classes of violence using crowdsourcing. Only tweets classified with a high degree of agreement were included in the final dataset.

Television and Video Game Violence: Age Differences and the Combined Effects of Passive and Interactive Violent Media

The present research examined the combined effects of violent video games and violent TV programs on third- and sixth-grade boys’ thoughts and behavior. In individual sessions, demographic information about the children’s television viewing and video game playing habits was collected. Participants were exposed to one of the following six media conditions for 15 minutes: a) watch a violent (wrestling...

User sentiment detection: a YouTube use case

In this paper we propose an unsupervised lexicon-based approach to detect the sentiment polarity of user comments in YouTube. Polarity detection in social media content is challenging not only because of the existing limitations in current sentiment dictionaries but also due to the informal linguistic styles used by users. Present dictionaries fail to capture the sentiments of community-created...

Publication date: 2017